Multiple Speaker Tracking and Detection: Handset Normalization and Duration Scoring
Authors
Kemal Sönmez, Larry Heck, Mitchel Weintraub
Abstract
Sönmez, Kemal, Heck, Larry, and Weintraub, Mitchel, Multiple Speaker Tracking and Detection: Handset Normalization and Duration Scoring, Digital Signal Processing 10 (2000), 133–142. We describe SRI’s speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking Switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment waveforms for speaker detection, which is carried out by averaging frame scores of the Viterbi path and normalizing for handset variation via a novel parameter interpolation extension of HNORM for use with waveform segments of arbitrary lengths. A short-duration penalty to augment the acoustic scores is also introduced via a nonlinear combination function. Results on the NIST 1998 Speaker Detection and Tracking Development Evaluation dataset are reported. © 2000 Academic Press
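The detection step in the abstract (averaging frame scores over a segment, then normalizing for handset with parameters that depend on segment length) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the interpolation scheme, so the linear interpolation between reference durations, the function names, and the `impostor_stats` table layout below are all assumptions made for illustration. The only part taken as standard is the HNORM idea of subtracting a handset-dependent impostor score mean and dividing by the corresponding standard deviation.

```python
from bisect import bisect_left


def average_frame_score(frame_scores):
    """Mean per-frame score (e.g. log-likelihood ratio) over one segment."""
    if not frame_scores:
        raise ValueError("segment has no frames")
    return sum(frame_scores) / len(frame_scores)


def interpolate_params(duration, table):
    """Interpolate impostor (mean, std) for an arbitrary segment duration.

    ASSUMPTION: `table` is a hypothetical list of (duration_seconds, mean, std)
    tuples sorted by duration, and linear interpolation is used between the two
    nearest entries. Durations outside the table are clamped to the end entries.
    """
    durations = [d for d, _, _ in table]
    if duration <= durations[0]:
        return table[0][1], table[0][2]
    if duration >= durations[-1]:
        return table[-1][1], table[-1][2]
    hi = bisect_left(durations, duration)
    lo = hi - 1
    d0, m0, s0 = table[lo]
    d1, m1, s1 = table[hi]
    w = (duration - d0) / (d1 - d0)
    return m0 + w * (m1 - m0), s0 + w * (s1 - s0)


def hnorm_score(frame_scores, duration, handset, impostor_stats):
    """Handset-normalized segment score: (raw - mean) / std.

    ASSUMPTION: `impostor_stats` maps a handset label to a duration table
    as described in `interpolate_params`.
    """
    raw = average_frame_score(frame_scores)
    mean, std = interpolate_params(duration, impostor_stats[handset])
    return (raw - mean) / std


# Hypothetical impostor statistics for two handset types (made-up numbers).
impostor_stats = {
    "carbon": [(3.0, -0.2, 0.8), (10.0, -0.1, 0.4)],
    "electret": [(3.0, -0.3, 0.9), (10.0, -0.2, 0.5)],
}

normalized = hnorm_score([0.45, 0.45], 6.5, "carbon", impostor_stats)
```

In this sketch a shorter segment gets a larger impostor standard deviation, so the same raw score yields a smaller normalized score; whether the published system behaves this way depends on details the abstract does not state.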
Similar resources
Speaker tracking and detection with multiple speakers
We describe a speaker tracking and detection system, for Switchboard conversations, that uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment speakers for detection, which is carried...
The 1999 NIST speaker recognition evaluation, using summed two-channel telephone data for speaker detection and speaker tracking
The 1999 NIST Speaker Recognition Evaluation encompassed three tasks: one-speaker detection, two-speaker detection, and speaker tracking. All tasks were performed in the context of conversational telephone speech. The one-speaker task used single channel mu-law data; the other tasks used summed two-channel data. Twelve sites from the United States, Europe, and India participated in the evaluatio...
Model selection and score normalization for text-dependent single utterance speaker verification
In this paper, we investigate model selection and channel variability issues on a text-dependent single utterance (TDSU) speaker verification application. Due to the lack of an appropriate database for the task, a multichannel speaker recognition database, which consists of multiple recordings of a single Turkish utterance, is collected. The first set of experiments is devoted to model selectio...
The NIST 1999 Speaker Recognition Evaluation - An Overview
This article summarizes the 1999 NIST Speaker Recognition Evaluation. It discusses the overall research objectives, the three task definitions, the development and evaluation data sets, the specified performance measures and their manner of presentation, and the overall quality of the results. More than a dozen sites from the United States, Europe, and Asia participated in this evaluation. There we...
Telephone-based Text-dependent Speaker Verification
In this thesis, we investigate model selection and channel variability issues on telephone-based text-dependent speaker verification applications. Due to the lack of an appropriate database for the task, we collected two multi-channel speaker recognition databases which are referred to as text-dependent variable text (TDVT-D) and text-dependent ...
Journal:
- Digital Signal Processing
Volume 10
Pages 133–142
Publication date: 2000